PREFACE



This technical note presents a number of techniques for promoting 
vectorization in FORTRAN programs to be run on the Cray Research 
CRAY-1 Computer System. Because the CRAY-1 FORTRAN Compiler (CFT) is 
continually being refined, this note will be updated periodically. 
With each update, I hope to increase the number of examples and expand 
it in a few other ways to improve its usefulness. 

I welcome any material that might be included in future editions, such 
as more examples and coding techniques. I especially solicit help 
with Appendix B where many errors of both omission and commission
undoubtedly lie. In particular, the various table positions that are 
blank indicate that I don't know the proper entry. Special thanks are 
due to the several people who sent me suggestions and examples. In 
particular, a great deal of help for this revision was provided by 
Dick Hendrickson. 

LCH 
SECTION 1
INTRODUCTION 



This note describes techniques for helping to vectorize codes written
for the CRAY-1 Computer and the CRAY-1 FORTRAN Compiler (CFT).
Since CFT is continually being refined, some of these
techniques will become unnecessary. The note is primarily intended to aid
programmers vectorizing existing codes, but it should also aid programmers
who are writing new codes to generate vectorizable loops.

Before going further, a caveat is in order. This note presents
techniques for enhancing the vectorizability of codes only; it
addresses few other methods of increasing program speed. Algorithm
selection is generally far more important than coding techniques. For
example, the FFT and various good sorting algorithms are approximately
N/log N times as fast as the typical simplistic algorithms that perform
the same tasks (for N input data), whereas vectorization usually
increases speed by a factor of 3 to 6. Thus, for typical dataset
sizes, these best algorithms, or other nearly optimal ones, are orders
of magnitude faster than poor algorithms. No fancy coding techniques
can overcome the use of ill-chosen algorithms in such cases. Indeed,
a good algorithm poorly coded is usually preferable to a poor one
optimally coded.

Also, this note does not address to any great extent good programming
practices, which for CFT include (1) using few loops with long code
blocks in preference to many short code loops; (2) judicious use of
typing of variables; (3) long loops inside short loops rather than
vice versa; and (4) if you are trying to get the last little bit from
a vectorized loop, inserting extra parentheses starting at the end of
an expression so that operations occur in an order that increases
chaining. The techniques that are described are presented in six
groups comprising the remaining sections of this note.

Sections 2 and 3 present the central issues that must be resolved 
before any useful work can be done, namely (1) finding the time 
consuming portions of the program and (2) circumventing overly modular 
or structured programming techniques. These tasks are so fundamental 
that compiler improvements are unlikely to be of much aid in the 
foreseeable future without programmer help in these areas.

Section 4 discusses recursion, feedback, or vector dependency and
introduces a simple but very useful and powerful technique: using
directives which allow the programmer to indicate to the compiler that
an individual block of code is logically vectorizable. Frequently,
although the programmer knows from the physics of a situation that the
code is vectorizable, coding in a form that allows the compiler to see
this may not be convenient. Directives provide a means of pointing
out vectorizable code to the compiler when this happens.

Section 5 shows one way to partially vectorize codes with irregular
addressing, another anathema of vectorization, but one that the
compiler will work around before too long.

Section 6 is a potpourri of "tricks" to improve vectorization that do
not readily fit into any of the earlier discussions. 

The final section describes removing IF statements, a syntactic 
construct that the compiler will soon handle, at least in some cases. 

Thus, Sections 5 and 7 should be of less interest to programmers who
are not in a big hurry to get the highest speed from their codes. 

As a general note, CFT vectorizes innermost DO-loops only; it does not
vectorize IF loops. Table 1 lists typical syntactic elements that may
inhibit vectorization. Except for I/O statements, which often can be
moved outside of loops after the debug phase, I discuss the more
difficult of these constructs and how the programmer can remove them
so that CFT will vectorize loops. Although not vectorized in the
usual sense, unformatted I/O statements which involve arrays are
processed with vector techniques.

CFT 1.06, the July 1979 release of CFT, vectorizes loops with
constructs in the easy group (Table 1) and allows scalar temporaries
and user-provided (but CAL-coded) functions from the second group. However,
it inhibits vectorization for a loop containing other constructs in
the second group or constructs in the third or fourth group. The
third group includes constructs that are theoretically vectorizable
but present challenges to the compiler writers. The items in the
fourth group present a theoretical impossibility so that the only real
hope for vectorizing loops containing them in the near future is to
break the loop into several loops with the "impossible" construct in a
separate (scalar) loop. If loops can be recast so that the inner DO
loops include only items in the first categories, the chance of
vectorization is enhanced and such loops that do not vectorize with
CFT 1.06 are more likely to vectorize in the near future. 
Table 1.  FORTRAN inner DO-loop constructs

Difficulty          Syntactic constructs

Easy                - Long or complicated loops
                    - Non-unit incrementing of subscripts
                    - Expressions in subscript
                    - Intrinsic function references

Straightforward     - Scalar temporary variables
                    - Function calls to programmer-supplied functions
                    - Inner products
                    - Logical IF statements
                    - Transfer out of a loop (search loop)
                    - Reduction operations

Difficult           - Linear recursion
                    - IF statements
                    - Some I/O
                    - Complicated subscript expressions

'Impossible'        - Nonlinear indexing
                    - Complicated branching within a loop
                    - Ambiguous subscripting
                    - Transfers into a loop
                    - Subroutine calls
                    - Nonlinear recursion
                    - Some I/O
SECTION 2
FINDING THE CENTRAL PORTION OF THE PROGRAM

Before spending your time vectorizing parts of the program that do not
significantly affect the run time, first analyze the program to 
determine where it spends its time. This information may be readily 
available if the code is simple or if there is someone available who 
is familiar with it. Suppose, however, that this is the first time 
you've been faced with this task and that you have never seen the 
program before. 



A typical, well-behaved or "nice" program has a structure similar
to that illustrated below. With any luck, you will be able
to find a similar pattern in your program and will be able to
concentrate on the inner points of the program, where your efforts
will significantly impact the program's run time.

      INITIALIZATION
            BOUNDARY POINTS
            INNER POINTS
            DONE? AND I/O
      NEXT CASE?

If you question the worth of this, look at a typical program and you
are likely to see many simple vector loops that have been there all
along. The trouble is, they initialize the grid and are not used for
any of the computations! Since a problem with a grid of 100 points on
a side has about 400 boundary points and about 10,000 interior points,
working on the interior points and ignoring the boundary points (let
alone the initialization) is clearly worthwhile. Even entirely
removing the code for the boundary points leaves more than 96% of the
points in the grid.
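
The point counts quoted above can be checked with a few lines of code. The following is a minimal sketch, not taken from any production code; the names GRIDCT, NBND, and NINR are chosen here purely for illustration, and the layout is kept plain so that it should be acceptable as either fixed-form or free-form source.

```fortran
      PROGRAM GRIDCT
!     Count boundary and interior points of an N-by-N grid.
!     N = 100 matches the example in the text; NBND and NINR
!     are names invented for this sketch.
      IMPLICIT NONE
      INTEGER N, NBND, NINR, I, J
      PARAMETER (N = 100)
      NBND = 0
      NINR = 0
      DO 20 J = 1, N
         DO 10 I = 1, N
            IF (I.EQ.1 .OR. I.EQ.N .OR. J.EQ.1 .OR. J.EQ.N) THEN
               NBND = NBND + 1
            ELSE
               NINR = NINR + 1
            END IF
   10    CONTINUE
   20 CONTINUE
      PRINT *, 'BOUNDARY POINTS  =', NBND
      PRINT *, 'INTERIOR POINTS  =', NINR
      PRINT *, 'INTERIOR PERCENT =', 100.0*REAL(NINR)/REAL(N*N)
!     Self-check: 4*N - 4 points on the edge, (N-2)**2 inside.
      IF (NBND .NE. 4*N - 4) STOP 1
      IF (NINR .NE. (N - 2)**2) STOP 1
      END
```

For N = 100 the exact counts are 396 boundary points and 9604 interior points, so the interior accounts for 96.04% of the grid.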

If you cannot discern the general structure, a worthwhile procedure is
to use the flow analysis option in CFT to get a complete list of the
subroutine calling tree and the time spent in each of the called
routines. These figures tell you which routines consume the largest
fractions of the time and thus which routines are worth looking
at. Refer to the CFT Reference Manual, section 5, for a description of
the flowtrace option.

The flowtrace option adds a substantial overhead to every subroutine
call and its output is lost if a job fails or executes a CALL EXIT.*
Thus, if you have a program with many small subroutines, it is
worthwhile to flowtrace a small case, at least for starters. You
might also put a test such as the following in your code to stop the
job after a reasonable length of time:

      IF (SECOND() .GT. 50) STOP**

To use the flowtrace option (CFT manual section 5.4.5), put ON=F on
the CFT statement:

      CFT,ON=F,... .

At the end of the run, you will get a table listing the time, percent 
of total time, number of times entered, and average time for each 
routine that is called as well as what routines called it and what 
routines it called. Only calls to FORTRAN programs that are compiled
by the CFT,ON=F,... statement are monitored; $FTLIB, $SCILIB, $SYSLIB,
and CAL routines are not monitored, nor are the FORTRAN routines
compiled separately without flowtrace enabled. Because of the great
difference in execution speed of vectorized code compared to 
non-vectorized code, use of flowtrace is recommended even if you are 
familiar with a program. The timing analysis of flowtrace is 
frequently surprising. 



*EXIT might be the name of one of your routines. Thus, the system 
cannot automatically assume that EXIT terminates a program. If you 
use EXIT to terminate your program, you can still use flowtrace by 
inserting the following subprogram in your deck. 



SUBROUTINE EXIT 
STOP 'EXIT' 
END 



**This is one of many examples of non-standard FORTRAN employed in 
this note. CFT accepts all FORTRAN shown in the examples here. 
SECTION 3
GETTING AROUND OVERLY MODULAR OR 
STRUCTURED PROGRAMS 

Assume your code looks like this:

      DO 31 I = 2,99
      DO 31 J = 2,99
      CENTPT = DATA(I,J)
      PTLEFT = DATA(I-1,J)
      PTRGHT = DATA(I+1,J)
      TEMP = TEMPURTR(I,J)
      TEMPRT = TEMPURTR(I+1,J)
      TEMPLFT = TEMPURTR(I-1,J)
      CALL INTGRTE (CENTPT,PTLEFT,PTRGHT,TEMPLFT,TEMPRT)
      CALL EQNOST (CENTPT,TEMP)
      DATA(I,J) = CENTPT
      TEMPURTR(I,J) = TEMP
   31 CONTINUE

Your first impulse probably is to put this away until there are global
FORTRAN compilers that vectorize messes like this. However, this
represents a very common situation and is not nearly as hopeless as it
first appears. It is hopeless for CFT, however; thus, it is your chore
to put the DO loops inside the subroutines or, conversely, the
subroutines inside the DO loops. Putting DO loops inside subroutines
or the converse operation is probably the most complex part of
vectorizing codes and is the part that is most likely to be beneficial
in the long run. The other techniques discussed in this note are more
likely to be handled by the compiler or by a vectorizer someday.

Putting DO loops inside subroutines probably entails subscripting the 
variable names being passed to the subroutines and passing the entire 
arrays at once. Putting the subroutine in the loop means expanding 
the subroutine code in line in the loop. 

In the above loops, this can be done easily. Perhaps in your problem 
the surface is not flat but is a sphere and so the right and left 
points wrap around at the ends causing nonlinear indexing. Then, you 
will have to try to separate the "bad points" and perhaps use some of 
the techniques suggested in later sections. 



2240207 3-1 



To illustrate putting DO loops into subroutines and vice versa, 
suppose the subroutines are as follows: 

      SUBROUTINE INTGRTE(C,PL,PR,TL,TR)
      COMMON DELTAX, DELTAT, GAMMAI,V,R
      C = C + DELTAT*0.5*(PL+PR)*DELTAX/(TR-TL)
      RETURN
      END

      SUBROUTINE EQNOST(P,T)
      COMMON DELTAX, DELTAT, GAMMAI,V,R
      T = (P/(V*R))**GAMMAI
      RETURN
      END

Then the two rewrites of the loop look like this:

Case 1: Putting loops inside subroutines.

The entire DO 31 loop pair is replaced by:

      CALL INTGRTV (DATA,TEMPURTR)
      CALL EQNOSTV (DATA,TEMPURTR)

where INTGRTV is a vector version of INTGRTE and EQNOSTV is a
vectorized EQNOST.

Then, these new subroutines are:*

      SUBROUTINE INTGRTV (D,T)
      DIMENSION D(100,100), T(100,100)
      COMMON DELTAX, DELTAT, GAMMAI,V,R
      DO 32 I=2,99
      DO 32 J=2,99
      D(I,J) = D(I,J) + DELTAT*0.5*(D(I-1,J)+D(I+1,J))
     $   *DELTAX/(T(I+1,J) - T(I-1,J))
   32 CONTINUE
      RETURN
      END



*Reversing the order of the I and J loops would cause an
unvectorizable dependency; see section 4.

      SUBROUTINE EQNOSTV (P,T)
      DIMENSION P(100,100), T(100,100)
      COMMON DELTAX, DELTAT, GAMMAI,V,R
      DO 33 I=2,99
      DO 33 J=2,99
      T(I,J) = (P(I,J)/(V*R))**GAMMAI
   33 CONTINUE
      RETURN
      END



Case 2: Putting the subroutines inside the loops.

In this case, the DO 31 pair of loops becomes:

      DO 34 I=2,99
      DO 34 J=2,99
      DATA(I,J) = DATA(I,J) + DELTAT*0.5*(DATA(I+1,J)
     $   +DATA(I-1,J))*DELTAX/(TEMPURTR(I+1,J) - TEMPURTR(I-1,J))
      TEMPURTR(I,J) = (DATA(I,J)/(V*R))**GAMMAI
   34 CONTINUE
      RETURN
      END



The second alternative is also especially suitable for functions,
i.e., expand the code in line as in the next example:

      DO 35 I = 1,1000
   35 X(I) = ALOG2(Y(I)+1) ...

      FUNCTION ALOG2 (X)
      DATA CST /.../
      ALOG2 = CST*ALOG(X)
      RETURN
      END

      DO 36 I = 1,1000
   36 X(I) = CST*ALOG(Y(I)+1) ...

The DO 35 loop will not vectorize because of the call to a routine
that the compiler doesn't recognize, but the DO 36 loop will vectorize.
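
A quick way to gain confidence in such an in-line expansion is to run both forms side by side and compare the results. The sketch below does this for the ALOG2 pattern. It is only an illustration: the constant CST = 1/ALOG(2.) is an assumption made here (the listing above leaves the value elided), and the program name INLOG and the arrays X1 and X2 are invented for the check.

```fortran
      PROGRAM INLOG
!     Compare a loop that calls a user function with the same
!     loop after the function body is expanded in line.
!     CST = 1/ALOG(2.) is an assumed value for this sketch.
      IMPLICIT NONE
      INTEGER N, I
      PARAMETER (N = 1000)
      REAL X1(N), X2(N), Y(N), CST, ALOG2
      EXTERNAL ALOG2
      CST = 1.0/ALOG(2.0)
      DO 10 I = 1, N
         Y(I) = REAL(I)
   10 CONTINUE
!     Function-call form (the DO 35 pattern).
      DO 20 I = 1, N
         X1(I) = ALOG2(Y(I) + 1.0)
   20 CONTINUE
!     In-line form (the DO 36 pattern).
      DO 30 I = 1, N
         X2(I) = CST*ALOG(Y(I) + 1.0)
   30 CONTINUE
!     The two forms should agree to within rounding.
      DO 40 I = 1, N
         IF (ABS(X1(I) - X2(I)) .GT. 1.0E-5) STOP 1
   40 CONTINUE
      PRINT *, 'FUNCTION AND IN-LINE FORMS AGREE'
      END

      REAL FUNCTION ALOG2(X)
!     Log base 2, written the same way as the in-line form.
      IMPLICIT NONE
      REAL X, CST
      CST = 1.0/ALOG(2.0)
      ALOG2 = CST*ALOG(X)
      RETURN
      END
```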

One common coding technique is to use vector subroutines such as VADD,
VMULT, and so on. The principal part of the program may then look
like this:

      CALL VADD(A,B,C,N)
      CALL VMULT(C,A,E,N)
      CALL VADD(E,B,A,N)

Expanding these subroutines in line and, where possible, combining the
many DO loops into a few will ensure vectorization and will allow
intermediate variables to be held in registers rather than being
returned to memory:

      DO 37 I = 1,N
      A(I) = (B(I) + A(I)) * A(I) + B(I)
   37 CONTINUE

Presumably, VADD, VMULT, etc. vectorize, but the DO 37 loop is
faster because the sum A + B and the product (A + B) * A do not have
to be stored but can be kept in a register, and A does not have to be
fetched a second time. Thus, the DO 37 loop is significantly faster
than the series of calls.
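
The equivalence of the combined loop and the series of calls is easy to verify. In the sketch below, three separate loops stand in for VADD, VMULT, and VADD (the library routines themselves are not called, and the assumed meaning is "third argument = first op second"); the program name FUSE and the array A2 are invented for the comparison.

```fortran
      PROGRAM FUSE
!     Three separate vector loops (standing in for calls to
!     VADD, VMULT, VADD) versus one combined loop.
      IMPLICIT NONE
      INTEGER N, I
      PARAMETER (N = 100)
      REAL A(N), B(N), C(N), E(N), A2(N)
      DO 10 I = 1, N
         A(I) = 0.5*REAL(I)
         B(I) = 1.0 + 0.25*REAL(I)
         A2(I) = A(I)
   10 CONTINUE
!     Separate loops: C and E go to memory between loops.
      DO 20 I = 1, N
         C(I) = A(I) + B(I)
   20 CONTINUE
      DO 30 I = 1, N
         E(I) = C(I)*A(I)
   30 CONTINUE
      DO 40 I = 1, N
         A(I) = E(I) + B(I)
   40 CONTINUE
!     Combined loop: intermediates can stay in registers.
      DO 50 I = 1, N
         A2(I) = (B(I) + A2(I))*A2(I) + B(I)
   50 CONTINUE
!     The test data are exactly representable, so the two
!     results are compared for equality.
      DO 60 I = 1, N
         IF (A(I) .NE. A2(I)) STOP 1
   60 CONTINUE
      PRINT *, 'SEPARATE AND COMBINED LOOPS AGREE'
      END
```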
SECTION 4

RECURSION AND DIRECTING THE 

COMPILER TO VECTORIZE 

Suppose the key inner loop in your program is like the DO 41 inner loop, 
which doesn't vectorize and is therefore one that you want to spend some 
time on. 

      DO 41 I = 1,100
      A(I) = A(I+L) ...
   41 CONTINUE

If the loop is truly recursive*, the situation may be hopeless.
However, if the value of L is such that there is no recursion (e.g., if
L is greater than 1000), the easiest approach is to try directing the
compiler to vectorize the loop and see if the answers remain the same.
Placing the following compiler directive in front of the DO loop to be
vectorized allows the compiler to vectorize a loop that has an apparent
vector dependency or recursion:

CDIR$ IVDEP      (see CFT manual sections 5.4, 5.4.3)

In other words, if either real or imagined recursion causes the loop
not to be automatically vectorized by the compiler, the IVDEP compiler
directive causes the computations to be done in vector mode. Note,
however, that if CALL or IF statements or anything besides or in
addition to apparent recursion prevents vectorization, CDIR$ IVDEP has
no effect. Also, the effect of the IVDEP is limited to only the next DO
loop; a separate IVDEP must be provided for each loop with an ignorable
dependency, and the IVDEP should immediately precede the DO statement.

Returning to the example, first try printing some of the A terms the
first few times through the vectorized loop to assure that vectorizing
the loop does not change the results. Though this hardly proves that no
problems can arise, it may help your analysis. This brute force
approach is inelegant and error-prone, especially in those cases where
one's insight into the physics of the situation does not provide some
assurance that the loop is recursion-free. If the value of L differs
with each pass through the loop, you may find it useful to make one copy
of the loop with the compiler directive to vectorize (ignoring vector
dependencies) and one copy of the loop without the directive, and to
select the vectorizable copy only when it is correct. This means that you
need to know what values of L are acceptable for vectorization of which
loops.
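
The two-copy idea can be sketched as follows. This is an illustration only: the routine name SHIFTA, the test values of L, and the selection rule (L not negative, or L at least as far back as the loop length) are assumptions made for the sketch, and the directive is spelled "!DIR$ IVDEP" so that compilers which do not recognize it simply treat it as a comment.

```fortran
      PROGRAM TWOCPY
!     Keep two copies of a loop and pick one from the value
!     of L. SHIFTA and the test values of L are illustrative.
      IMPLICIT NONE
      INTEGER N, NPAD, I, K, L, LVAL(3)
      PARAMETER (N = 100, NPAD = 150)
      REAL A(1-NPAD:N+NPAD), R(1-NPAD:N+NPAD)
      DATA LVAL / 5, -1, -120 /
      DO 40 K = 1, 3
         L = LVAL(K)
         DO 10 I = 1-NPAD, N+NPAD
            A(I) = REAL(I)
            R(I) = REAL(I)
   10    CONTINUE
         CALL SHIFTA(A, 1-NPAD, N+NPAD, N, L)
!        Reference result from a plain scalar loop.
         DO 20 I = 1, N
            R(I) = R(I+L) + 1.0
   20    CONTINUE
         DO 30 I = 1, N
            IF (A(I) .NE. R(I)) STOP 1
   30    CONTINUE
   40 CONTINUE
      PRINT *, 'SELECTED LOOP COPY MATCHES SCALAR RESULT'
      END

      SUBROUTINE SHIFTA(A, LO, HI, N, L)
      IMPLICIT NONE
      INTEGER LO, HI, N, L, I
      REAL A(LO:HI)
      IF (L .GE. 0 .OR. L .LE. -N) THEN
!        No feedback is possible for these L values, so the
!        copy carrying the directive is selected.
!DIR$ IVDEP
         DO 10 I = 1, N
            A(I) = A(I+L) + 1.0
   10    CONTINUE
      ELSE
!        Possible feedback: use the copy left in scalar mode.
         DO 20 I = 1, N
            A(I) = A(I+L) + 1.0
   20    CONTINUE
      END IF
      RETURN
      END
```

The test in SHIFTA is deliberately conservative; a code that knows more about its own values of L could admit more cases to the first copy.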



*Recursion is a buzzword used to describe the case where output is
propagated back into the input. It is explained further below. 
Some rules of thumb to follow are the following:

1. If the sign of L is the same as the sign of the loop
increment, there is never any recursion.

2. If the sign of L is the opposite of that of the loop increment,
there is probably recursion. An exception is when
the loop increment and L have a least common multiple
larger than the maximum value of the loop index.

3. There is no recursion of concern if the magnitude of L is
such that there is no overlap of subscripts between the
right and left sides of the computation. In fact, if L
divided by the loop increment is greater than 64, you have
no worries because the compiler breaks loops into
64-at-a-time blocks for vectorization.

If these simple rules do not help, you may have to analyze the problem 
further to determine when you can safely use vector computations. 
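
The rules above can be explored by emulating vector execution in ordinary scalar code: fetch a whole block of operands into a temporary, then store the block. The sketch below does this with a block length of 64 for the loop X(I) = X(I+L) + 1.0 and reports whether the emulated vector result matches the scalar result. The function name SAME, the block order (first block first), and the test values of L are assumptions made for this illustration.

```fortran
      PROGRAM CHUNK
!     Emulate block-at-a-time vector execution of
!        X(I) = X(I+L) + 1.0
!     and compare it with ordinary scalar execution.
      IMPLICIT NONE
      LOGICAL SAME
      EXTERNAL SAME
      PRINT *, 'L =    1, VECTOR MATCHES SCALAR: ', SAME(1)
      PRINT *, 'L =   -1, VECTOR MATCHES SCALAR: ', SAME(-1)
      PRINT *, 'L = -100, VECTOR MATCHES SCALAR: ', SAME(-100)
!     Rule 1 (same sign), rule 2 (opposite sign), rule 3
!     (offset longer than one 64-element block).
      IF (.NOT. SAME(1)) STOP 1
      IF (SAME(-1)) STOP 1
      IF (.NOT. SAME(-100)) STOP 1
      END

      LOGICAL FUNCTION SAME(L)
      IMPLICIT NONE
      INTEGER L, N, VL, NPAD, I, I0, I1
      PARAMETER (N = 1000, VL = 64, NPAD = 200)
      REAL XS(1-NPAD:N+NPAD), XV(1-NPAD:N+NPAD), T(VL)
      DO 10 I = 1-NPAD, N+NPAD
         XS(I) = REAL(2*I)
         XV(I) = REAL(2*I)
   10 CONTINUE
!     Scalar order: one element at a time.
      DO 20 I = 1, N
         XS(I) = XS(I+L) + 1.0
   20 CONTINUE
!     Vector order: fetch a whole block, then store it.
      DO 50 I0 = 1, N, VL
         I1 = MIN(I0+VL-1, N)
         DO 30 I = I0, I1
            T(I-I0+1) = XV(I+L) + 1.0
   30    CONTINUE
         DO 40 I = I0, I1
            XV(I) = T(I-I0+1)
   40    CONTINUE
   50 CONTINUE
      SAME = .TRUE.
      DO 60 I = 1, N
         IF (XS(I) .NE. XV(I)) SAME = .FALSE.
   60 CONTINUE
      RETURN
      END
```

With L = -100 the loop is recursive in the strict sense, yet the block-at-a-time result agrees with the scalar one because every operand a block needs was stored by an earlier block.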

"Recursion" is a mathematical term used to describe feedback, a noun you 
may find more familiar and easier to remember. The phenomenon referred 
to is the use of the output of one pass through the loop for the input 
to a computation on a subsequent pass. Consider the following simple
examples:

      DO 42 I = 1,1000                    SUM = 0.
   42 X(I) = X(I - 1) + 1                 DO 43 I = 1,1000
                                       43 SUM = SUM + X(I) * Y(I)



In the code on the left, the value of X(1) is used to compute X(2); X(2)
is used to compute X(3), and so on. If this were done in vector mode,
all of the X terms would be fetched at once, 1 would be added to each of
them, and only the first value would be known to be correct. In the
code on the right, the value of SUM is used for each subsequent pass
through the loop. Inserting a CDIR$ IVDEP would probably produce wrong
answers in the DO 42 loop and would have no effect on the DO 43 loop
because the reason for scalar mode of the DO 43 loop is loop collapsing,
as well as recursion.

The following loops are similar to these but are nonrecursive: 



      DO 44 I = 1,1000                    DO 45 I = 1,1000
   44 X(I) = X(I + 1) + 1              45 A(I) = A(I) + X(I) * Y(I)



In the DO 44 loop, no X value is reused after being computed so there is 
no feedback. Similarly, the DO 45 loop is not recursive because no A 
value is reused after being generated; it is merely stored. 
To understand the effects of recursion on vectorization, it is important
to realize that vectorization is an essentially parallel computation on
a group of values. Consider the simple case:

      DO 46 I = 2,3
      A(I-1) = 3.0
   46 B(I) = A(I)

This loop is equivalent to the sequential statements:

      A(1) = 3.0
      B(2) = A(2)
      A(2) = 3.0
      B(3) = A(3)

Vectorization, in effect, reorders the sequence to:

      A(1) = 3.0
      A(2) = 3.0
      B(2) = A(2)      (now 3.0)
      B(3) = A(3)

and the "vectorized" sequence probably produces different results.

Whenever CFT encounters a loop which might be recursive, it generates 
correct scalar code rather than fast and possibly incorrect vector code, 
because vector and scalar versions of a recursive loop generally produce 
different results. 

Recursion can cause problems if numerically equal subscript values occur 
on different passes through a DO loop and at least one of them is on the 
left of the equal sign. 

There are two general classes of recursion: 

1. A value is prematurely destroyed if vectorized. The preceding 
is an example of this. The loop can often be made 
non-recursive by reordering the statements or by using 
temporary storage. 



      DO 47 I = 2,3                       DO 48 I = 2,3
      B(I) = A(I)                         TEMP = A(I)
   47 A(I-1) = 3.0                        A(I-1) = 3.0
                                       48 B(I) = TEMP
2. A value is not ready when needed. This is the one-line
recursion relationship:

      DO 49 I = 2,3
   49 A(I) = B*A(I-1)+C

Because of the group computation in vector mode, both input A values
(the group A(1), A(2)) are used to compute the output value group A(2)
and A(3). However, in this case the original A(2), not B*A(1)+C, is
used to compute A(3), a probable error.
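
The first class of recursion, and its cure by reordering, can be demonstrated by writing out the statement-at-a-time order explicitly. The sketch below runs the DO 46 loop in scalar order, then in a vector-like order (each statement applied to every I before the next statement starts), and finally runs the reordered DO 47 loop in the same vector-like order. The initial values A(I) = 10*I and the names REORD, AV, BV, AR, and BR are assumptions chosen for the illustration.

```fortran
      PROGRAM REORD
!     DO 46 pattern in scalar order and in a vector-like
!     order, then the reordered DO 47 pattern.
!     Initial A(I) = 10*I is an assumption of this sketch.
      IMPLICIT NONE
      INTEGER I
      REAL A(3), B(3), AV(3), BV(3), AR(3), BR(3)
      DO 10 I = 1, 3
         A(I) = 10.0*REAL(I)
         AV(I) = A(I)
         AR(I) = A(I)
         B(I) = 0.0
         BV(I) = 0.0
         BR(I) = 0.0
   10 CONTINUE
!     Scalar execution of the DO 46 loop.
      DO 20 I = 2, 3
         A(I-1) = 3.0
         B(I) = A(I)
   20 CONTINUE
!     Vector-like execution of DO 46: one statement at a time.
      DO 30 I = 2, 3
         AV(I-1) = 3.0
   30 CONTINUE
      DO 40 I = 2, 3
         BV(I) = AV(I)
   40 CONTINUE
!     Vector-like execution of the reordered DO 47 loop.
      DO 50 I = 2, 3
         BR(I) = AR(I)
   50 CONTINUE
      DO 60 I = 2, 3
         AR(I-1) = 3.0
   60 CONTINUE
      PRINT *, 'SCALAR DO 46     B(2), B(3) =', B(2), B(3)
      PRINT *, 'VECTOR DO 46     B(2), B(3) =', BV(2), BV(3)
      PRINT *, 'VECTOR DO 47     B(2), B(3) =', BR(2), BR(3)
!     The forced order changes B(2); the reordering fixes it.
      IF (BV(2) .EQ. B(2)) STOP 1
      IF (BR(2) .NE. B(2) .OR. BR(3) .NE. B(3)) STOP 1
      END
```

In the vector-like order of DO 46, B(2) picks up the 3.0 that was just stored into A(2) instead of the original A(2); after reordering, B is read before A is overwritten and the scalar answer is recovered.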

In many cases, CFT is not able to determine whether or not a subscript
leads to recursion. For example:

      DO 410 I = 1,10
  410 A(I,J) = A(I-1,JPLUS1)

is recursive if J is ever the same as JPLUS1. If the programmer knows
from the physics of the situation, for example, that J and JPLUS1 will
never be the same, then our

CDIR$ IVDEP

is appropriate. Alternatively, the loop could be rewritten as

      DO 411 I = 1,3
  411 A(I,J) = A(I-1,J+1)

and CFT would automatically vectorize it. In general, it is an aid to
vectorization if subscripts can be explicitly written out. For example:

      DO 412 I = 1,3
  412 A(I) = A(I+N)

In many cases, N is not really a "variable"; it has a constant value and
often never even changes from run to run. A "variable" is used simply
to provide some flexibility in case the problem ever changes. Rather
than initialize N with

      DATA N/3/
or    N = 3

it is much better to use:

      PARAMETER (N = 3)

and CFT would automatically vectorize the sample loop.

In the following illustrations of recursive and non-recursive loops,
assume that X(I) = 2*I, Y(I) = -I, and Z(I) = 0 before the codes are
run. The final values of X are given after each loop for subscripts 1
through 5.

Recursive:

      DO 413 I = 2,5
      X(I) = X(I - 1) + 1.
  413 CONTINUE

      X = 2, 3, 4, 5, 6

CDIR$ IVDEP
      DO 414 I = 2,5
      X(I) = X(I - 1) + 1.
  414 CONTINUE

      X = 2, 3, 5, 7, 9

the last three of which are bad values because of forced
vectorization of a recursive loop.

Non-recursive:

      DO 415 I = 2,5
      X(I) = Y(I - 1) + 1.
  415 CONTINUE

      X = 2, 0, -1, -2, -3

The compiler vectorizes the DO 415 loop automatically.

      DO 416 I = 1,5
      X(I) = X(I) + 1.
  416 CONTINUE

      X = 3, 5, 7, 9, 11

The compiler vectorizes the DO 416 loop automatically.

      DO 417 I = 1,5
      X(I) = Z(I) + X(I) * Y(I)
  417 CONTINUE

      X = -2, -8, -18, -32, -50

The compiler vectorizes the DO 417 loop automatically.
      L = 10
      DO 418 I = 1,5
      X(I) = X(I + L)
  418 CONTINUE

      X = 22, 24, 26, 28, 30

      L = 10
CDIR$ IVDEP
      DO 419 I = 1,5
      X(I) = X(I + L)
  419 CONTINUE

      X = 22, 24, 26, 28, 30

Here, the 418 loop does not vectorize but the 419 loop does. The
compiler does not know that L is not negative.



Recursive:

      L = -1
      DO 420 I = 1,5
      X(I) = X(I + L)
  420 CONTINUE

      X = 0, 0, 0, 0, 0

      L = -1
CDIR$ IVDEP
      DO 421 I = 1,5
      X(I) = X(I + L)
  421 CONTINUE

      X = 0, 2, 4, 6, 8

The last four are incorrect because of forced vectorization
of a recursive loop.

Here, the value of X(1) is fed back to compute X(2), i.e., the loop is
recursive and the vectorized version of the loop produces wrong
results. In scalar mode, the computations proceed:

      X(1) = X(0)      (=0 by assumption)
      X(2) = X(1)      (=0 from last computation)
      X(3) = X(2)      (=0 from last computation)
      X(4) = X(3)      (=0 from last computation)
      X(5) = X(4)      (=0 from last computation)




CFT generates code that executes as above because its approach to
vectorization is conservative. When forced to vectorize, the loop
executes:

      X = shifted X = (0, 2, 4, 6, 8)

so that X(2) = original value of X(1), not the just-computed value;
similarly, X(3) = original X(2), not the newly computed value, etc.
Also, in this example it is assumed that 0 is a legal subscript, i.e., X
is declared DIMENSION X(0:50).
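
The two sets of values quoted for the L = -1 case can be reproduced without a vector machine by writing the "fetch everything, then store everything" order by hand. The following sketch assumes X(I) = 2*I initially, as in the illustrations above; the program name FORCED and the arrays XS, XV, and T are invented for the purpose.

```fortran
      PROGRAM FORCED
!     X(I) = X(I + L) with L = -1, run in scalar order and in
!     an emulated vector order (fetch all, then store all).
      IMPLICIT NONE
      INTEGER I, L
      REAL XS(0:50), XV(0:50), T(5)
      L = -1
      DO 10 I = 0, 50
         XS(I) = REAL(2*I)
         XV(I) = REAL(2*I)
   10 CONTINUE
!     Scalar order: each new value feeds the next pass.
      DO 20 I = 1, 5
         XS(I) = XS(I+L)
   20 CONTINUE
!     Emulated vector order: all operands are fetched first.
      DO 30 I = 1, 5
         T(I) = XV(I+L)
   30 CONTINUE
      DO 40 I = 1, 5
         XV(I) = T(I)
   40 CONTINUE
      PRINT *, 'SCALAR:', (XS(I), I = 1, 5)
      PRINT *, 'VECTOR:', (XV(I), I = 1, 5)
!     Scalar gives all zeros; vector gives the shifted X.
      DO 50 I = 1, 5
         IF (XS(I) .NE. 0.0) STOP 1
         IF (XV(I) .NE. REAL(2*(I-1))) STOP 1
   50 CONTINUE
      END
```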

Many examples of recursion are of the following form where L is negative 
(that is, opposite in sign to the increment of J, which is 1 here) and K 
is positive (of the same sign as the increment of I in this example) : 

      DO 422 I = 1,100
      DO 422 J = 1,100
  422 A(I,J) = A(I + K, J + L) ...



Here, by inverting the order of the loops, you can remove the recursion 
and allow vectorization by using the CDIR$ IVDEP directive. This type
of loop order inversion is frequently too complex to analyze easily and 
you may need to go back to the physics of the situation or to that 
unfortunate alternative of printing gobs of values to determine a 
reasonable way to reorder or rewrite the code. 

The following examples show a simple but real case where the compiler's 
overly conservative attitude is easy to see and correct. Case 1 runs 
about four times slower than Case 2. The cause of such a large speed 
increase is the complexity of the loop. Loops with very few 
computations generally have less speed up. 
CASE 1

      NL1 = 1
      NL2 = 2

      DO 423 KX = 2,3
      DO 423 KY = 2,21
      DU1 = U1(KX,KY + 1,NL1) - U1(KX,KY - 1,NL1)
      DU2 = U2(KX,KY + 1,NL1) - U2(KX,KY - 1,NL1)
      DU3 = U3(KX,KY + 1,NL1) - U3(KX,KY - 1,NL1)
      U1(KX,KY,NL2) = U1(KX,KY,NL1)+A11*DU1+A12*DU2+A13*DU3
     $ +SIG*(U1(KX+1,KY,NL1)-2.*U1(KX,KY,NL1)+U1(KX-1,KY,NL1))
      U2(KX,KY,NL2) = U2(KX,KY,NL1)+A21*DU1+A22*DU2+A23*DU3
     $ +SIG*(U2(KX+1,KY,NL1)-2.*U2(KX,KY,NL1)+U2(KX-1,KY,NL1))
      U3(KX,KY,NL2) = U3(KX,KY,NL1)+A31*DU1+A32*DU2+A33*DU3
     $ +SIG*(U3(KX+1,KY,NL1)-2.*U3(KX,KY,NL1)+U3(KX-1,KY,NL1))
  423 CONTINUE



The values of NL1 and NL2 are swapped before the next pass through the loop.

CASE 2

      DO 424 KX = 2,3
CDIR$ IVDEP
      DO 424 KY = 2,21
      DU1 = U1(KX,KY + 1,NL1) - U1(KX,KY - 1,NL1)
      DU2 = U2(KX,KY + 1,NL1) - U2(KX,KY - 1,NL1)
      DU3 = U3(KX,KY + 1,NL1) - U3(KX,KY - 1,NL1)
      U1(KX,KY,NL2) = U1(KX,KY,NL1)+A11*DU1+A12*DU2+A13*DU3
     $ +SIG*(U1(KX+1,KY,NL1)-2.*U1(KX,KY,NL1)+U1(KX-1,KY,NL1))
      U2(KX,KY,NL2) = U2(KX,KY,NL1)+A21*DU1+A22*DU2+A23*DU3
     $ +SIG*(U2(KX+1,KY,NL1)-2.*U2(KX,KY,NL1)+U2(KX-1,KY,NL1))
      U3(KX,KY,NL2) = U3(KX,KY,NL1)+A31*DU1+A32*DU2+A33*DU3
     $ +SIG*(U3(KX+1,KY,NL1)-2.*U3(KX,KY,NL1)+U3(KX-1,KY,NL1))
  424 CONTINUE



I hope these examples shed some light on this rather abstruse topic. 
SECTION 5

IRREGULAR ADDRESSING 

Irregular or nonlinear addressing arises in situations such as those
using data structures requiring subscripted subscripts. Subscripted
subscripts do not occur explicitly in FORTRAN-66 code but may
effectively occur in certain types of programs, as below.

In the DO 51 loop, Y essentially has a subscripted subscript:

      DO 51 I = 1,1000
      J = INDEX(I)
      X(I) = Y(J) ...
   51 CONTINUE

Change to:

      DO 52 I = 1,1000
      J = INDEX(I)
   52 TEMP(I) = Y(J)
      DO 53 I = 1,1000
   53 X(I) = TEMP(I) ...

The DO 51 loop cannot vectorize with CFT 1.06 because of the nonlinear
indexing. The DO 52 loop similarly doesn't vectorize, but the DO 53 loop
does and, if the computations are extensive, the speed-up can be
dramatic.

In general, if the computations are sufficiently complicated to warrant
the work, you can restructure the loop into two or three loops. The
first new loop is a GATHER loop in which all the data to be manipulated
are collected into vectors. Next is the computation loop. Last is the
SCATTER loop, in which results are distributed from the vector used in
the computation loop to their proper locations. Quite often, as in this
example, there is no SCATTER loop. There are $SCILIB routines for doing
the GATHER and SCATTER (see Appendix A and CRI Manual 2240014).
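
The three-loop structure can be sketched in full as follows. The example is hypothetical: the index array INDX is taken to be a permutation so that the scatter has no repeated targets, the "computation" is a simple stand-in for something more extensive, and the names GATSCA, INDX, TEMP, and Y2 are chosen only for the illustration. The library GATHER and SCATTER routines are not used here.

```fortran
      PROGRAM GATSCA
!     Gather, compute, scatter. INDX is a permutation here,
!     so the scatter has no repeated targets (an assumption
!     of this sketch). All names are illustrative.
      IMPLICIT NONE
      INTEGER N, I, J
      PARAMETER (N = 1000)
      INTEGER INDX(N)
      REAL Y(N), Y2(N), TEMP(N)
      DO 10 I = 1, N
         INDX(I) = N + 1 - I
         Y(I) = REAL(I)
         Y2(I) = REAL(I)
   10 CONTINUE
!     Original form: irregular addressing in the work loop.
      DO 20 I = 1, N
         J = INDX(I)
         Y(J) = 2.0*Y(J) + 1.0
   20 CONTINUE
!     GATHER loop (irregular addressing).
      DO 30 I = 1, N
         TEMP(I) = Y2(INDX(I))
   30 CONTINUE
!     Computation loop (regular addressing only).
      DO 40 I = 1, N
         TEMP(I) = 2.0*TEMP(I) + 1.0
   40 CONTINUE
!     SCATTER loop (irregular addressing).
      DO 50 I = 1, N
         Y2(INDX(I)) = TEMP(I)
   50 CONTINUE
      DO 60 I = 1, N
         IF (Y(I) .NE. Y2(I)) STOP 1
   60 CONTINUE
      PRINT *, 'GATHER/COMPUTE/SCATTER MATCHES ORIGINAL LOOP'
      END
```

If the index array could contain repeated values, the scatter step would need more care, since several results would then be sent to the same location.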

The following example illustrates this again for a particle pushing 
algorithm. 
CASE 1

      DO 54 K = 1,150
      IX = GRD(K)
      XI = IX
      VX(K) = VX(K) + EX(IX) + (XX(K) - XI) * DEX(IX)
      XX(K) = XX(K) + VX(K) + FLX
C     IX IS, IN EFFECT, A SUBSCRIPTED SUBSCRIPT
      IR = XX(K)
      RI = IR
      RX1 = XX(K) - RI
      IR = IR - (IR/64) * 64
      XX(K) = RI + RX1
C     IR IS AN IRREGULAR SUBSCRIPT
      RH(IR) = RH(IR) + 1.0 - RX1
      RH(IR + 1) = RH(IR + 1) + RX1
   54 CONTINUE
CASE 2

      DO 55 K = 1,150
      IX = GRD(K)
      XIV(K) = IX
      EXC(K) = EX(IX)
      DEXC(K) = DEX(IX)
   55 CONTINUE
C     GATHER LOOP ABOVE
C     XI IS VECTORIZED INTO XIV
C     EX IS GATHERED INTO EXC
C     DEX IS GATHERED INTO DEXC
      DO 56 K = 1,150
      VX(K) = VX(K) + EXC(K) + (XX(K) - XIV(K)) * DEXC(K)
      XX(K) = XX(K) + VX(K) + FLX
      IRV(K) = XX(K)
      RI = IRV(K)
      RX1V(K) = XX(K) - RI
      XX(K) = RI + RX1V(K)
   56 CONTINUE
C     COMPUTATION LOOP WHICH VECTORIZES IS ABOVE
      DO 57 K = 1,150
      RH(IRV(K)) = RH(IRV(K)) + 1.0 - RX1V(K)
      RH(IRV(K) + 1) = RH(IRV(K) + 1) + RX1V(K)
   57 CONTINUE
C     SCATTER LOOP

The code in Case 2 runs more than twice as fast as that in Case 1. 
SECTION 6

MISCELLANEOUS TECHNIQUES

This section includes a group of examples that do not readily fit into
the categories discussed above. In some sense, this is a
bag-of-tricks chapter demonstrating several additional loop
restructuring techniques as well as all multi-loop techniques. The
techniques here are harder to describe in a general and systematic
fashion.

A matrix multiply represents an algorithm that can benefit from loop
restructuring. For example, the following code illustrates the common
way of coding the matrix multiply:

      DO 61 I = 1,L
      DO 61 J = 1,M
      C(I,J) = 0.0
      DO 61 K = 1,N
   61 C(I,J) = C(I,J) + A(I,K) * B(K,J)

The recursion on C(I,J) and loop collapsing prevent vectorization now
(CFT 1.06) and will always prevent vectorization as full as that of the
rewrite below. This rewritten code vectorizes fully, resulting in a
speedup of 5 to 10 times:

      DO 63 I = 1,L
      DO 62 J = 1,M
   62 C(I,J) = 0.0
      DO 63 K = 1,N
      DO 63 J = 1,M
   63 C(I,J) = C(I,J) + A(I,K) * B(K,J)

In many similar situations, although the result is going not into a
subscripted variable but into a scalar temporary, you can reorder the
loops and store the results in a vector temporary instead of a scalar
temporary.
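
That the reordered matrix multiply computes the same product as the common form can be confirmed with a small test. The sketch below uses tiny matrices with integer-valued entries so that the comparison can be exact; the sizes, the test data, and the names MMREOR, C1, and C2 are assumptions made for the illustration, and the loops are written with one label per loop for the benefit of current compilers.

```fortran
      PROGRAM MMREOR
!     Matrix multiply in the common loop order and in the
!     reordered form with J as the innermost loop.
      IMPLICIT NONE
      INTEGER L, M, N, I, J, K
      PARAMETER (L = 4, M = 5, N = 3)
      REAL A(L,N), B(N,M), C1(L,M), C2(L,M)
      DO 30 K = 1, N
         DO 10 I = 1, L
            A(I,K) = REAL(I + K)
   10    CONTINUE
         DO 20 J = 1, M
            B(K,J) = REAL(K - J)
   20    CONTINUE
   30 CONTINUE
!     Common order: the inner K loop feeds back into C1(I,J).
      DO 60 I = 1, L
         DO 50 J = 1, M
            C1(I,J) = 0.0
            DO 40 K = 1, N
               C1(I,J) = C1(I,J) + A(I,K)*B(K,J)
   40       CONTINUE
   50    CONTINUE
   60 CONTINUE
!     Reordered: the inner J loop updates a whole row of C2.
      DO 100 I = 1, L
         DO 70 J = 1, M
            C2(I,J) = 0.0
   70    CONTINUE
         DO 90 K = 1, N
            DO 80 J = 1, M
               C2(I,J) = C2(I,J) + A(I,K)*B(K,J)
   80       CONTINUE
   90    CONTINUE
  100 CONTINUE
      DO 120 J = 1, M
         DO 110 I = 1, L
            IF (C1(I,J) .NE. C2(I,J)) STOP 1
  110    CONTINUE
  120 CONTINUE
      PRINT *, 'BOTH LOOP ORDERS GIVE THE SAME PRODUCT'
      END
```

In the reordered form the row C2(I,1) through C2(I,M) plays the role of a vector temporary: each element is updated once per pass of the inner loop, so no pass depends on the result of the pass before it.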
The next example shows several stages in the speed-up process. Case 2
is more than 50% faster than Case 1 and Case 3 is almost four times as
fast as Case 1.

CASE 1

      Q = 0.0
      DO 64 K = 1,996,5
      Q = Q + Z(K) * X(K) + Z(K + 1) * X(K + 1)
     $      + Z(K + 2) * X(K + 2) + Z(K + 3) * X(K + 3)
     $      + Z(K + 4) * X(K + 4)
   64 CONTINUE

In this original case, the loop was quintupled, presumably to cut loop
overhead or allow greater overlap of operations.

CASE 2

      DO 65 K = 1,996,5
      TP(K) = Z(K) * X(K) + Z(K + 1) * X(K + 1)
     $      + Z(K + 2) * X(K + 2) + Z(K + 3) * X(K + 3)
     $      + Z(K + 4) * X(K + 4)
   65 CONTINUE
      Q = 0.0
      DO 66 K = 1,996,5
      Q = Q + TP(K)
   66 CONTINUE
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CASE 3 

Q = SDOT(1000,Z,1,X,1) 

Here, SDOT is the BLA single-precision dot function (see Appendix A or 
CRI publication 2240204). 

As an aid to remembering the calling sequences for the basic linear 
algebra functions: the first argument is the vector length, and the 
remaining arguments are in pairs, a vector operand followed by its 
increment in memory. 

Thus, if A and B are declared DIMENSION A(M,N),B(N,L) and you want to 
compute the dot product of the Ith row of A with the Jth column of B, 
use: 

      AB = SDOT(N,A(I,1),M,B(1,J),1) 

where N = the vector length = number of elements in each vector operand, 
A(I,1) and B(1,J) are the starting locations in memory of the operands, 
M = memory increment of the first operand vector, and 1 = memory 
increment of the second operand vector. 
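
The increment convention can be sketched with a plain FORTRAN 
stand-in. MYDOT below is a hypothetical function written for this 
illustration (it is not the library SDOT and handles only positive 
increments); the program name DOTDEM and the test data are likewise 
assumptions. It forms the dot product of row I of A with column J of 
B and compares it with a value computed from explicit subscripts. 

```fortran
      PROGRAM DOTDEM
C     Illustrative sketch; MYDOT is a hypothetical stand-in that
C     mimics the SDOT argument convention for positive increments.
      INTEGER M, N, L
      PARAMETER (M = 3, N = 4, L = 2)
      REAL A(M,N), B(N,L), AB, REF, MYDOT
      INTEGER I, J, K
      DO 11 K = 1,N
      DO 10 I = 1,M
      A(I,K) = REAL(10*I + K)
   10 CONTINUE
   11 CONTINUE
      DO 21 J = 1,L
      DO 20 K = 1,N
      B(K,J) = REAL(K + J)
   20 CONTINUE
   21 CONTINUE
      I = 2
      J = 1
C     Row I of A has memory increment M; column J of B has 1.
      AB = MYDOT(N, A(I,1), M, B(1,J), 1)
C     Reference value from explicit subscripts.
      REF = 0.0
      DO 30 K = 1,N
      REF = REF + A(I,K) * B(K,J)
   30 CONTINUE
      PRINT *, 'AB =', AB, ' REF =', REF
      END

      REAL FUNCTION MYDOT(N, X, INCX, Y, INCY)
C     Dot product of N elements taken from X and Y with the given
C     positive memory increments.
      INTEGER N, INCX, INCY, K, IX, IY
      REAL X(*), Y(*)
      MYDOT = 0.0
      IX = 1
      IY = 1
      DO 10 K = 1,N
      MYDOT = MYDOT + X(IX) * Y(IY)
      IX = IX + INCX
      IY = IY + INCY
   10 CONTINUE
      END
```

Because FORTRAN stores arrays by columns, successive elements of a row 
of A are M words apart, which is why M is passed as the increment of 
the first operand. The two printed values should agree. 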

Appendix A lists the BLA subroutines briefly as well as a few other 
useful routines that are in $SCILIB. 

A planned enhancement to CFT is to perform scalar operations for 
individual statements in otherwise vectorizable loops. An industrious 
programmer can achieve this now by using VFUNCTIONs: 

CDIR$ VFUNCTION ... 

which tells the compiler of external non-library vector functions. For 
CFT 1.06, these can be written only in CAL. Section 5 and Appendix F of 
the CFT manual provide the information necessary to link such routines 
to FORTRAN programs. 

SECTION 7 

REMOVING IF STATEMENTS AND USING 
BUILT-IN FUNCTIONS 

CFT 1.06 does not vectorize code blocks that contain IF statements. 
Many types of loops with IF statements are not hopeless, however. 
Several things can be done, depending on the structure of the code. CFT 
will eventually vectorize many of these for you, but in the meantime you 
can help by using some of the built-in functions such as AMAX1, ABS, 
CVMGT, CVMGZ, etc. (see CFT manual Appendixes B and C). For example: 

      DO 71 I = 1,1000 
      IF(A(I) .LT. 0.) A(I) = 0. 
   71 B(I) = SQRT (A(I)) ... 

which can be converted to: 

      DO 72 I = 1,1000 
      A(I) = AMAX1(A(I),0.) 
   72 B(I) = SQRT (A(I)) ... 

The DO 71 loop doesn't vectorize now; the DO 72 loop does. 

All the built-in arithmetic functions in FORTRAN (in $FTLIB) have both 
vector and scalar versions; the compiler calls the vector version for 
vectorizable loops*. The vector merge operations, CVMG*, are typeless 
functions that allow you to merge the results of different vector 
computations such as the following: 

      DO 74 I = 1,1000 
      IF(A(I) .LT. 0.) GOTO 73 
      B(I) = A(I) + D(I) ... 
      GOTO 74 
   73 B(I) = A(I) * E(I) ... 
   74 CONTINUE 

This can be rewritten to vectorize as: 

      DO 75 I = 1,1000 
   75 B(I) = CVMGT(A(I) * E(I) ..., A(I) + D(I) ..., A(I).LT.0.) 



*Some are actually pseudo vector routines; they allow the loop to 
vectorize but are performed in scalar mode. 

The mnemonic for the CVMG* group of functions is that the last letter of 
the name is the condition on which the first argument is used. Since 
these functions are Boolean, they can be used with integer or floating 
operands and results and in scalar loops as well as in vector loops. 
Thus, if you are not sure that you are computing the value of B 
correctly, you can put a print statement in the loop, which causes it to 
be scalar, and still obtain the same results, albeit much more slowly 
than before. 
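
The equivalence of the branching loop and the merge loop can be 
sketched without CFT. MERGET below is a hypothetical REAL-only 
stand-in for CVMGT, written so the example runs on any FORTRAN 
compiler; the program name MRGDEM and the test data are also 
assumptions. 

```fortran
      PROGRAM MRGDEM
C     Illustrative sketch; MERGET is a hypothetical REAL-only
C     stand-in for CVMGT so the example runs without CFT.
      INTEGER N
      PARAMETER (N = 8)
      REAL A(N), D(N), E(N), B1(N), B2(N), MERGET
      INTEGER I, NBAD
      DO 10 I = 1,N
      A(I) = REAL(I) - 4.5
      D(I) = REAL(I)
      E(I) = 2.0
   10 CONTINUE
C     Branching form, as in the DO 74 loop.
      DO 30 I = 1,N
      IF (A(I) .LT. 0.) GOTO 20
      B1(I) = A(I) + D(I)
      GOTO 30
   20 B1(I) = A(I) * E(I)
   30 CONTINUE
C     Merge form, as in the DO 75 loop: both candidates are
C     computed and the condition selects one of them.
      DO 40 I = 1,N
      B2(I) = MERGET(A(I) * E(I), A(I) + D(I), A(I) .LT. 0.)
   40 CONTINUE
      NBAD = 0
      DO 50 I = 1,N
      IF (B1(I) .NE. B2(I)) NBAD = NBAD + 1
   50 CONTINUE
      PRINT *, 'MISMATCHES =', NBAD
      END

      REAL FUNCTION MERGET(X, Y, L)
C     Returns X if L is true, otherwise Y.
      REAL X, Y
      LOGICAL L
      MERGET = Y
      IF (L) MERGET = X
      END
```

Note that the merge form evaluates both expressions for every element, 
so it is safe only when neither expression can fault for the elements 
on which it is not selected. 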

Table 2 lists the merge functions and some of the other ones that you 
may want in similar situations. 



Table 2.  Some typical in-line CFT functions. 

FUNCTION NAME       RESULT TYPE     ARGUMENT TYPES   OPERATION 

AMAX1(X1,X2,...)    Real            Real             Largest Xi 
AMAX0(I1,I2,...)    Real            Integer          Largest Ii, floated 
MAX1(X1,X2,...)     Integer         Real             Largest Xi, truncated 
MAX0(I1,I2,...)     Integer         Integer          Largest Ii 
CVMGT(X,Y,L)        Boolean         Boolean          X if L True, 
                    (single word)   (single word)    otherwise Y 
CVMGZ(X,Y,Z)        Boolean         Boolean          X if Z is zero, 
                    (single word)   (single word)    Y if Z is nonzero 



Another technique that works in some cases is inverting the order of 
loops so that the IF statements are in the outer loops rather than in 
the inner loops. Also, if the purpose of the IF test is to separate an 
exceptional case from other cases and if the computation is extensive, 
it may be worthwhile to write a loop to do the testing and to write a 
vectorizing loop for the computations. 

Here are some more examples: 

      Y(I) = 1.0 
      IF(X(I).EQ.0.) GOTO 76 
      Y(I) = 1.0/X(I) 
   76 CONTINUE 

Change this to: 

      Y(I) = 1.0/CVMGZ(1.,X(I),X(I)) ... 

which allows a loop containing it to vectorize and yet does not cause a 
divide fault. Here, CVMGZ selects 1. when X(I) = 0; otherwise it 
selects X(I). Alternatively, if this exceptional condition only occurs 
in cases when the result is not used, you can surround the loop 
containing it with CALL CLEARFI and CALL SETFI to turn the floating 
point interrupt off and then on again. This allows generation of an 
infinity without interrupting the program. 
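
A minimal sketch of the guarded divide follows. MERGEZ is a 
hypothetical REAL-only stand-in for CVMGZ, and the program name DIVDEM 
and the data values are assumptions chosen for the illustration. 

```fortran
      PROGRAM DIVDEM
C     Illustrative sketch; MERGEZ is a hypothetical REAL-only
C     stand-in for CVMGZ.
      INTEGER N
      PARAMETER (N = 5)
      REAL X(N), Y(N), MERGEZ
      INTEGER I
      DATA X /2.0, 0.0, -4.0, 0.0, 0.5/
C     The divisor is 1. wherever X(I) is zero, so no divide
C     fault can occur and Y(I) comes out as 1.0 there.
      DO 10 I = 1,N
      Y(I) = 1.0 / MERGEZ(1., X(I), X(I))
   10 CONTINUE
      PRINT *, (Y(I), I = 1,N)
      END

      REAL FUNCTION MERGEZ(X, Y, Z)
C     Returns X if Z is zero, otherwise Y.
      REAL X, Y, Z
      MERGEZ = Y
      IF (Z .EQ. 0.) MERGEZ = X
      END
```

For the data shown, Y should come out as 0.5, 1.0, -0.25, 1.0, 2.0, 
matching the original IF version element by element. 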

The next example illustrates loop reordering and IF statement removal: 

CASE 1 

      DO 77 K = 1,3 
      FR(K) = 0. 
   77 CONTINUE 
      DO 79 JA = 1,500 
      IF (JA .EQ. IA) GOTO 79 
      DS = 0. 
      DO 78 K = 1,3 
      A(K) = RS(K,IA) - RS(K,JA) 
      DS = DS + A(K) ** 2 
   78 CONTINUE 
      DS = SQRT(DS) 
      IF (DS .GT. RAD) GOTO 79 
      ... 
   79 CONTINUE 

CASE 2 

      DO 710 JA = 1,500 
      DSV(JA) = 0. 
C     DSV IS A VECTOR OF DS VALUES 
  710 CONTINUE 
      DO 712 K = 1,3 
      DO 711 JA = 1,500 
      AM(K,JA) = RS(K,IA) - RS(K,JA) 
      DSV(JA) = DSV(JA) + AM(K,JA) ** 2 
  711 CONTINUE 
  712 CONTINUE 
      DO 713 JA = 1,500 
      DSV(JA) = SQRT(DSV(JA)) 
  713 CONTINUE 
      DO 714 JA = 1,500 
      IF (DSV(JA) .GT. RAD) GOTO 714 
      ... 
  714 CONTINUE 



Some of the loops in Case 1 vectorize, but the vector length is only 3. 
In Case 2, the inner loops vectorize with a vector length of 500. In 
particular, the 713 loop uses a vector square root, saving a great deal 
of time. As it turns out, in the "real life" example, most of the time 
DS was greater than RAD (last statement shown in the loop) so the rest 
of the loop did not need any work. Even though additional vectorization 
could be done, it would not have been very productive. With the change 
illustrated, the entire kernel ran more than four times faster than the 
original. 
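
The reordering can be checked with a small self-contained sketch. The 
program name DSTDEM, the coordinate data, the value of RAD, and the 
neighbor counters N1 and N2 (standing in for the part of the loop not 
shown) are assumptions for the illustration. The JA .EQ. IA test in 
the last Case 2 loop is also an assumption, added so that both cases 
count the same set of points. 

```fortran
      PROGRAM DSTDEM
C     Illustrative sketch; NP, the coordinates, RAD, and the
C     neighbor counts N1 and N2 are assumptions for the demo.
      INTEGER NP
      PARAMETER (NP = 500)
      REAL RS(3,NP), A(3), AM(3,NP), DSV(NP), DS, RAD
      INTEGER IA, JA, K, N1, N2
      DO 11 JA = 1,NP
      DO 10 K = 1,3
      RS(K,JA) = REAL(MOD(JA*K, 17))
   10 CONTINUE
   11 CONTINUE
      IA = 7
      RAD = 6.0
C     Case 1: short inner loop of length 3 inside the JA loop.
      N1 = 0
      DO 30 JA = 1,NP
      IF (JA .EQ. IA) GOTO 30
      DS = 0.
      DO 20 K = 1,3
      A(K) = RS(K,IA) - RS(K,JA)
      DS = DS + A(K) ** 2
   20 CONTINUE
      DS = SQRT(DS)
      IF (DS .GT. RAD) GOTO 30
      N1 = N1 + 1
   30 CONTINUE
C     Case 2: loops reordered so the inner loops have length NP.
      DO 40 JA = 1,NP
      DSV(JA) = 0.
   40 CONTINUE
      DO 51 K = 1,3
      DO 50 JA = 1,NP
      AM(K,JA) = RS(K,IA) - RS(K,JA)
      DSV(JA) = DSV(JA) + AM(K,JA) ** 2
   50 CONTINUE
   51 CONTINUE
      DO 60 JA = 1,NP
      DSV(JA) = SQRT(DSV(JA))
   60 CONTINUE
      N2 = 0
      DO 70 JA = 1,NP
      IF (JA .EQ. IA) GOTO 70
      IF (DSV(JA) .GT. RAD) GOTO 70
      N2 = N2 + 1
   70 CONTINUE
      PRINT *, 'CASE 1 COUNT =', N1, ' CASE 2 COUNT =', N2
      END
```

The price of the reordering is the extra storage for AM and DSV, which 
replace the short array A and the scalar DS. The two counts should be 
equal. 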

APPENDIX A 

$SCILIB SUBROUTINES 

This appendix summarizes the scientific library subroutines. 

For a current and more complete description of these functions, refer to 

the Library Subroutine Reference Manual, CRI Publication 2240014. 

LEGEND: 

N Vector length 

X,Y Floating point vectors 

IX, IY Increments in memory of floating point vectors 

C,D Complex vectors 

IC,ID Increments in memory of complex vectors 

NB Number of bits per word selected for PACK/UNPACK 

NW Number of words in unpacked array. 



Name (Parameters)      Type                Purpose 

ISAMAX(N,X,IX)         Integer function    Index to real array element 
                                           having maximum absolute value. 
ICAMAX(N,C,IC)         Integer function    Index to complex array element 
                                           having maximum modulus. 
SASUM(N,X,IX)          Real function       Sums the absolute values of a 
                                           real array. 
SCASUM(N,C,IC)         Real function       Sums the absolute values of 
                                           real and imaginary parts of 
                                           complex array. 
SAXPY(N,X,IX,Y,IY)     Subroutine          Performs vector computation 
                                           y <- ax+y on real arrays x,y. 
CAXPY(N,C,IC,D,ID)     Subroutine          Performs vector computation 
                                           y <- ax+y on complex arrays x,y. 
SCOPY(N,X,IX,Y,IY)     Subroutine          Copies real array x into real 
                                           array y. 
CCOPY(N,C,IC,D,ID)     Subroutine          Copies complex array c into 
                                           complex array d. 
SDOT(N,X,IX,Y,IY)      Real function       Dot product of real arrays x,y. 
CDOTC(N,C,IC,D,ID)     Complex function    Dot product of complex arrays 
                                           c,d. 
CDOTU(N,C,IC,D,ID)     Complex function    Dot product of complex arrays 
                                           c,d. 
SNRM2(N,X,IX,Y,IY)     Real function       Euclidean norm of real array x. 
SCNRM2(N,C,IC)         Real function       Euclidean norm of complex 
                                           array c. 
SROT(N,X,IX,Y,IY)      Subroutine          Performs Givens transformation 
                                           on real arrays x,y. 
SROTG(...)             Subroutine          Calculates parameters for SROT. 
SROTM(...)             Subroutine          Modified Givens transformation. 
SROTMG(...)            Subroutine          Sets up parameters for 
                                           modified Givens. 
SSCAL(N,A,X,IX)        Subroutine          Rescales real array x: 
                                           x <- ax, a real. 
CSSCAL(N,A,C,IC)       Subroutine          Rescales complex array c: 
                                           c <- ac, with a real. 
CSCAL(N,A,C,IC)        Subroutine          Rescales complex array c: 
                                           c <- ac, with a complex. 
SSWAP(N,X,IX,Y,IY)     Subroutine          Swaps real arrays x,y. 
CSWAP(N,C,IC,D,ID)     Subroutine          Swaps complex arrays c,d. 
MXMA(...)              Subroutine          Completely general matrix 
                                           multiply. 
CFFT2(...)             Subroutine          Fourier transforms binary 
                                           radix complex array. 
RCFFT2(...)            Subroutine          Fourier transforms binary 
                                           radix real to complex. 
CRFFT2(...)            Subroutine          Fourier transforms binary 
                                           radix complex array to real. 
PACK(P,NB,U,NW)        Subroutine          Packs power of 2 bit partial 
                                           word lists. 
UNPACK(P,NB,U,NW)      Subroutine          Unpacks list into power of 2 
                                           bit partial words. 
MINV(...)              Subroutine          Returns solution of general 
                                           linear equation set, matrix 
                                           inverse optional. 
SSUM(N,C,IC,D,ID)      Real function       Sums the elements of a real 
                                           array. 
CSUM(N,C,IC)           Complex function    Sums the elements of a 
                                           complex array. 
CROT(N,X,IX,Y,IY)      Subroutine          Applies complex Givens 
                                           rotation. 
CROTG(N,C,IC,D,ID)     Subroutine          Sets up rotational parameters 
                                           for CROT. 
FILTERG(...)           Subroutine          Performs general filtering and 
                                           auto-correlation. 
FILTERS(...)           Subroutine          Calculates symmetric filter 
                                           coefficient. 
OPFILT(...)            Subroutine          Wiener-Levinson equation 
                                           solver. 

APPENDIX B 

FORTRAN DIALECTICAL DIFFERENCES 

As an aid for conversions, this appendix contains a number of tables 
that compare the FORTRAN compiler dialects for several manufacturers. 
The following tables are included. 

1. Hardware dependencies 

2. Coding features 

3. Declaratives and ordering 

4. Names and variables 

5. Constants, literals, and strings 

6. Arithmetic and expressions 

7. Branching and control statements 

8. Input/output formatting 

9. Subroutines and functions 

10. Intrinsic or inline functions 

11. External functions 

All of these tables are based on a scan of manuals and are thus prone to 
error and very prone to omissions. Further, as manufacturers bring 
their FORTRAN dialects into conformance with FORTRAN X3.9-1978, some of 
these differences can be expected to disappear. The tables reflect 
1975-1979 versions of FORTRAN. In particular, while CDC FTN 5 is to be 
an ANSI-1978 version, here CDC means either FTN 4 or RUN FORTRAN. IBM 
means FORTRAN H. Univac means either FORTRAN V or ASCII FORTRAN. ICL 
means either 1900 or 2900 FORTRAN. The tables do not generally include 
any FORTRAN statement, syntax, or peculiarity where CFT is believed to be 
at least as general as the other dialects shown. "Unusual" and its 
synonyms mean "non-CFT." 



USB CQ 

Nunbet of character* per word, B.I . 
bit* per character 

Character Manipulation in CVT 

cannot Involve *>ore than a 

character* par word. Card 

inagea require IDA* foraat 

i.e., 10 words to atore lu^i. 

True/false repiaaentatlon. Negative, 

Equivalence* to logicale or Positive 
binary setting* of logical* 

Collationi lettara before or 
after number, each contiguous. 

tearing for character sequences! 

lattera greater than nuaibara 

internally for CFT. 

Internal Character Code ASCII 



Binary conatant* 

«.... Mini I followed by a 
atring of digits, etc. 



TABLE 1 HARDWARE DEPENDENCIES 

SHSIrSi HSLJl £pc jut 

Unspecified Unapecifled 10, i 4,0 



Unspecified Unapecifled FTNi Neg, 

HUH i Pos.O 



UMIVAC 

Al «,» 
Vl «,« 



Truei tSB-1 
raiser LSB«0 



l»00i (,i 



After, yea Unapecifled Unapecifled Before. Yea Before, No After, ¥*• Before, No 



Dlaplay Code EBCDIC 

Not allowed Nat allowed FTM...B I... 

BUN 0.. . 
0B...B 



A. ASCII (* Blt| 
Virield Data It Bltl 

O... l... 



HONEKHEIX 



True / 
false - 






Comment indicators 
(in column 1) 



C£T 
C,« 



TABLE 2 CODING FEATURES 
ANSI-66 ANSI-77 CDC IBM 

C C,* C|«,$ C 



UNIVAC IC 

Ci« in line C 
for rest of 
line 



HONEYWELL 

c,« 



Continuation allowed 
(line lengthi cords) 

Multiple statements per 
line, separator isi 



IS 



Not allowed 



Not allowed 



Not allowed 



19 



Multiple replacement statement Not allowed Not allowed Not allowed Yes 



1320 chars 
total 



19 



Not allowed t for comment Not allowed 



Not allowed Not allowed 1900 i Yes 

2900* No 



Not allowed 






PROGRAM statement 



Pseudo functions (functions 
usable on-left or right of >}| 

must be recoded without 

pseudo functions. 

Partially reserved words 
some names may be illegal* 
O. ..means names beginning 
with O. 

Unusual characters allowed 



Defines prog- 
ram name 

Not allowed 



FORMAT 
END 



Not allowed 



Defines Defines files 
program name 



Not allowed Not allowed Not allowed BITS 

SUBSTR 



FTNi FORMAT 

FUNCTION 

RUNi CALL, END 
0...,etc. 



t in CALL 
• in MS I/O 
S in name 



Name only 
allowed 

Not allowed 



i in CALL and 
for concaten- 
ation 

' in MS I/O 
t in 



» in CALL 



\ for " 

t for continu- 
ation 
• in MS I/O 

I for statement 
terminator 






Hen 

Pita assignable In declarative 

Precidons of vaiUblo (by *n) 
In type statement 

■TYPE" type allowed 

Hon err type* 

DAT*)...) allowed 

Arlthaetic statement (unction* 
can be othac than altac 
declarative before executable 
atatcaenta 

Parenthesis requited in 
PARAMETER 

IMPLICIT Bust precede all other 
declarative*, axecutablas 

Result* of DAT* statesent type 

■iaaatch. 

for RUN and UN I VAC rOHTHAM V 
Bust correct typea of cons- 
tant* in DATA statements. 

COMMON irregularities) 

Initial common block lengths 
nust be as long a* aver 
required. Numbered common 
Bust be changed to naaed. 



£EX 


ANSl-t* 


ANS1-77 


epe 


IBM 


UNIVAC 


l£k 


IIOUKYHELL 


No 


No 


No 


No 


Yea 


Ves 


Vea 


Yes 


Yes 


Not allowed 


Not allowed 


Not allowed 


yes 




yea 




No 


No 


No 


Yes 


No 


ye* 




Ye* 


Nona 


None 


Character 


ECS 




Character 


Character 


Character 
Abnoraal 


No 


No 


No 


yea 


No 


No 


No 




No 


Not allowed 


No 


ITNt No 
RUNr Yes 











Hot allowed ye* 



Yes Not allowed Vea 



Converted Not allowed Converted 

except for except for 

logical. logical 

complex character 



Yea 



RUNi ignored 



Numbered None 
COMMON blocks 
changing 
length* of 

COMMON 



hi Convert 
Vr Ignore 



None 






Maximum number of characters 
in names 



* in 



Lower case characters ' 



ANSI-66 
6 



TABLE 4 NAMES OF VARIABLES 
ANSI-77 CDC IBM 

6 7 6 



Not allowed Not allowed Not allowed Not allowed Yes 



UNI VAC 
6 



32 



After initial Not allowed Not allowed 
letter 



Not allowed Not allowed Not allowed Not allowed Not allowed Treated as Not allowed Treated as 

upper case upper case 






Item CFT 

Non CFT types None 

Quad precision change to 
double and double to 
single, probably. 



Nonstandard length identifiers 
(such as READ'D for REAL'S) 
Double precision and Quad 
precision functions must be | 
changed to correspond to 
actual argument types where 
these are changed. 

Unusual constant forms 



Alternate character codes None 
(See also third hardware item) 



TABLE 5 CONSTANTS, LITERALS, AND STRINGS 
ANSI-66 ANSI-77 CDC IBM 

None Character Not allowed 



None 



Quad pre- 
cision 
double- 
complex 



UNI VAC 

Quad pre- 
cision 
double- 
complex 
character 

Not allowed 



ICL 

Quadruple 

double-comp. 

character. 



I,J,K,L,M,H, 
E,R,Q,D. 



Not allowed 


Not allowed 


FTNi ...B 


Z... for Hex 


0... 


1900 lO... 


O. .. 






RIINt ...B 


Data init- 


for octal 


for octal 


for octal 






or 0... 


ialisation 




2900iZ... 








for octal 


only. 




for Hex 





Not Specified Not Specified Display code EBCDIC 



Vi Field data 






TABLE 6 ARITHMETIC AND EXPRESSIONS 



1US £12 

i/j/x - float ii/ji/x *.. 

foe CDC Subscript (t| can 
be added in arithmetic 
statements for subscripted 
variables If ona It not present. 



ANSj-sf 
Nut al lowed 



V«* 



£S£ 

Ha 



lfiS 
v«« 



univac 



i£L IIOHEHWEIL 

yes 



U A*-B ok land other similar 
on* ft 


Not allowed 


Not allowed 


Not allowed 


Not allowed 


Nat allowed 


At No 
Vi «•• 


Not allowed 


Subscripts required foe 
multi-dimensional arrays 




At least on* 


All 


All 


Nan* 


All 


All 




Man-integral subscripts 




Nat allowed 


Not allowed 


Not allowed 


mil Yes 

MINI No 


Vee ( teal 


Yes 


Veai real 01 






lisa sa 

Results of out-of-range con- Fall through 
puted CO TO. 

Arithmetic IF can have missing Nat allowed 
statement labels. 

Arithmetic ir can have complex * Not allowed 
argument 



00 I I • 10.1 executes No times 

This 1* a difficult error to 
uon I tot unless IF's arc Inserted 
for alt DO statements but it can 
be fixed with 0N»J on the err statement 
Complex relational operator. .HE. and 
Complex .eq. only 



Labels on non-executable 
non-rOKHAT statements 



Nat allowed 



TABLE 7 BRANCHING AND CONTROL STATEMENTS 
ANSI-66 ANSI-77 CDC IBM UNIVAC ICL HONEYWELL 

Not allowed Fall through ratal error Fall through rail through rail through Fatal error 



Not allowed Nat allowed 



Nut allowed Ha times 



Not allowed 



(lot allowed 



.HE. and 
.EQ only 



Once 



Otti only 
real part 
tested except 
.NS, and .EQ. 



labels allowedi 
Reference not 
allowed 



Once 



Not allowed 



Not allowed 



yes, GOTO cen IxDut yes yes 
also lioa, Ho 



Mot allowed Not allowed Yesi only Not allowed Not allowed 

real part 
teated 



A* if Once 

DO 1 1-10,1,-1 



Not allowed 
can and DO. 
Vi yea 



A ■ FOHMAT 



Not allowed 






Item 
Fiee format I/O 

End-of-file, erior checks 

TAPE, I/O TAPE, etc. allowed 
with R/W statements 

Random mass storage I/O 
Other I/O statements 

None CFT FORMAT specs 

Format paren nesting maximum 
Encode/Decode pecularlties 



CFT 

Not allowed 



END-, ERR-, 
EOF, UNIT, 
I EOF 



Not allowed 



ANSI -6 6 
Not allowed 



GETPOS 
SETPOS 



Not allowed 



Not allowed 



None 



Not allowed 



Not allowed 



TABLE 8 INPUT/OUTPUT FORMATTING 



ANSI -77 



CDC 



* for state- * (or state- 
ment label In stent label 
I/O statement 



END-, ERR- 
IOSTAT- 



Not allowed 



FTNl EOF 
IOCHEX 
RUNl EOF.U 
IOCHEX, I) 

FTNl No 
RUNl Yes 



READI..REC-) READMS/ 
WRITE (..NEC-) WRITEMS etc. 



OPEN 

CLOSE 

INQUIRE 

EH.dEe.etc. 



READEC 
HRITEC 
MOV LEV 

Ew.dEe, etc. 



FTNl 
RUNl 



READ i WRITE 
to character 
strings 



• for state- 
ment label 



Not allowed 



FIND 
(a'rt etc. 

READ(U,ID-w) 
AD(UV) 
DEFINF, etc. 

Qw.d 

E,F ok Cor 

Integers! G 

ok integer, 

logical 



REREAD 



ICL 



HONEYWELL 



• for state- * for state- Not allowed 
ment label ment label 



END-, ERR- 



END-, ERR* 
IOSTAT- 



Ai 

Vi 



No 
Yes 



FIND 
(a'r) etc. 



Not allowed 



READ (u,fl, clause) 

list, etc. Like IBM 



Iw.d, Jw, +S Qw.d, V, Hw.c 
Ew.dEe, etc. C ok for logical 
or integer 



5+ 



Char count 
optional! no. 
of chars 
converted 
available. 
ERR- allowed 



Char count 
not Included. 
ERR- allowed. 






ltea 

In pia^umi RfcTOKN-STOr 
In subroutine! END • return 



Alternate latum* syntax 



££1 

No 
Yea 



mM* 



i 
i 

TABLE 9 SUBROUTINES AND FUNCTIONS 
ANSI-77 CDC IBM 



No 



FTNi Yea/Yes Yes 
HUH I Yea/No No 



UHIVAC 

'Internal* 

subroutines 
•llowtd 



ICJf IIONEYWe 

STOP required 



• before Not allowed 
label in CALL 

* in aubr. 



ENTRY has own calling sequence 

and usable as • (unction 
CSC ENTRY stateaente aust have 
cor tact calling sequences added. 

END statement required 

Ouawy argument in slashes 
• call by addies* 

*4 subprogram naae" allowed 
Overlay syntax 



DEFINE used In arithmetic 
atateaent (unctions 



Not allowed 



• before ' I-,-,... I » befoie 
label in CALL RETURNS (-,-, label in CALL 

* in subr. . ..) i la subr. 



No 



Yes 




yea 


yes 




Illegal 


Illegal 


Illegal 


Illegal 


(as 


Not allowed 


Not allowed 


Not allowed 


Not allowed 


Hi yes 

Gi Ho 


U)R coaaanda 
ROOT, POVL, 
SOW. 
CALL overlay 


Unspecified 


Unspecified 


OVEMAY|...| 




Not allowed 


Not allowed 


Hot allowed 


Not allowed 


Allowed 



Ait or $ a before rOKTRAN 11 

before label label in CALL syntax used 

in CALL. 

Vi Indexing 

Is absolute, 

not relative. 



Veai aaae type Yes 
as (unction 


Yea 


Ho Ves 


Ye* 


Yea Ves 


Illegal 


In EXTERNAL Not allowed 
only 


Not allowed 


Use BANK 
stateaent 





Allowed 



Not allowed 



TABLE 10 INTRINSIC OR INLINE FUNCTIONS 






I tea 
Memory management 
I/O 
Compiler directives 

Assembler code with FORTRAN 
Code insertion 



Actual functions that are 
unusual 



CFT 
None 



COIRS i LIST 
NOLI ST, EJECT 
etc. 



Not allowed 
Via UPDATE 

None 



ANSI -66 - 

None 

Not allowed 

Not specified 

Not allowed 
Not allowed 

None 



ANSI-77 



Not allowed 



CDC 

ECS 

Unload 

PTNi No 
RUNi between 
routines, 
. in column 1 

I DENT 

Via UPDATE 



IBM 
None 
Unload 
Generic 

Not allowed 



SHIFT All double 
RUNi FORTRAN precision 
II syntax Q prefixes 

on functions 



UNIVAC 
BANK 

Compiler 

Not allowed 

Include 
Delete 

All double 
precision 



HONEYWELL 



Double precision + 
D,E,Q,R,I,J,K 
pre fixes i fatal 
functions i many 
additional such 
as trig with 
degrees . 






ANSI-66 



TABLE 11 EXTERNAL FUNCTIONS 
ANSI-77 CDC IBM 



HONEYWELL 



General differences 



Specific functions 



None 



All double 
precision 



FORTRAN II 

functions i 

LOCF.ABSr, etc 

SLITE 

DISPLA 

TIKE 

I0CHEC 



All double 
precision 

GAMMA 
LGAMMA 



All double 
precision 

Time, date 



APPENDIX C 

TRACEBACK 

There is a FORTRAN library routine, TRBK(file), that you can call to 
determine how the program reached the current routine. If there is no 
argument, the trace is printed in the logfile; if file is specified 
('$OUT', perhaps), the traceback goes to that file. 



When a task terminates abnormally, TRBK is automatically called so a 
trace of the subroutine calling tree to the point of error is provided. 
To make this as useful as possible, turn on the block listing from CFT: 

CFT,ON=B, ... 

and the load map on the loader card: LDR,MAP,... so that the addresses 
provided can be easily localized in the FORTRAN source. Below is an 
annotated abort message to illustrate using the trace information. 

AB053 - FLOATING POINT ERROR 
AB000 - JOB STEP ABORTED. P = 0144760A 
      - ******** WAS CALLED BY SOLVE AT LOCATION 0144760A 
      - SOLVE WAS CALLED BY EQNS AT LOCATION 0012341C 
      - EQNS WAS CALLED BY $MAIN AT LOCATION 0003411A 

The cause of termination, from the first line of the abort message, is 
a floating point overflow. Another common diagnostic is "OPERAND RANGE 
ERROR", which occurs when an attempt is made to reference some part of 
memory outside your user area, most likely because of a wild index. If 
all output is lost, one possible cause is using block common with a 
DIMENSION of 1, COMMON X(1), but storing into X with subscripts much 
larger than 1. (This may overwrite I/O buffers and tables.) 

The last instruction being executed at the time of abort was at location 
0144760A. For the FORTRAN programmer, discard the parcel (A in this 
case), leaving the OCTAL address of the instruction word. The actual 
error probably occurred on an earlier instruction. The load map ADDRESS 
gives the base address of all routines. In this case, find the base 
address of SOLVE, where the error occurred (from the third line of the 
abort message). Subtract the base address from the absolute P-counter 
address given to find the relative address in SOLVE. (Remember to 
subtract in OCTAL!!!) 






Because the BLOCK (ON=B) listing was turned on for CFT, you will have a 
list of all blocks, which will allow you to find the FORTRAN code block 
where the abort occurred: 

P-Counter                        0144760 (A) 
Loader ADDRESS of SOLVE        - 0104700 
                                 ------- 
Relative address in SOLVE        0040060 

Partial block listing for SOLVE: 

SOLVE VECTOR BLOCK BEGINS AT SEQ. NO. 1372, P=40035B 
SOLVE BLOCK BEGINS AT SEQ. NO. 1380, P=40055D 
SOLVE BLOCK BEGINS AT SEQ. NO. 1401, P=40077A 

Because the relative error address is 40060, which is between 40055 and 
40077, the bomb occurred in a FORTRAN statement between sequence numbers 
1380 and 1401. The listing should now help you find the probable source 
of the error quickly. 
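
The octal subtraction can also be left to the machine. The following 
is a minimal sketch, assuming a compiler that supports the O edit 
descriptor (standard only from Fortran 90 on); the program name OCTSUB 
and the variable names are illustrative assumptions. It uses the 
addresses from this example. 

```fortran
      PROGRAM OCTSUB
C     Illustrative sketch using the addresses of the example.
C     The O edit descriptor is standard only from Fortran 90 on.
      CHARACTER*7 CPC, CBASE, CREL
      INTEGER IPC, IBASE, IREL
      CPC = '0144760'
      CBASE = '0104700'
C     Convert the octal digit strings to integers.
      READ (CPC, '(O7)') IPC
      READ (CBASE, '(O7)') IBASE
C     Ordinary integer subtraction; only the notation is octal.
      IREL = IPC - IBASE
      WRITE (CREL, '(O7.7)') IREL
      PRINT *, 'RELATIVE ADDRESS = ', CREL
      END
```

The result should be 0040060, the relative address computed by hand 
above. 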

Similarly, the point at which SOLVE was called by EQNS can be 
determined. The absolute address of the CALL was 0012341C, and EQNS was 
called by the main program at absolute location 0003411A. 

APPENDIX D 

COMPILER DIRECTIVES 

Compiler directive lines begin with the characters CDIR$ in columns 1 
through 5 and any of the directives listed below in columns 7 through 72. 

DIRECTIVE    FUNCTION 

EJECT        Ejects to top of next page. 
LIST         Resumes listable output. 
NOLIST       Suppresses production of listable output. 
CODE         Produces code list. 
NOCODE       Suppresses production of CFT-generated code lists. 
VECTOR       Enables vectorization of inner DO-loops. 
NOVECTOR     Suppresses vectorization of inner DO-loops. 
IVDEP        Ignores vector dependencies in the next DO-loop. 
INT24        Identifies listed variables and arrays as 24-bit 
             integers, equivalent to INTEGER*2 declarative. 
FLOW         Enables flowtrace. 
NOFLOW       Disables flowtrace. 
SCHED        Enables the scheduler. 
NOSCH        Disables the scheduler. 
VFUNCTION    Identifies external vector functions. 
BOUNDS       Checks array references for out-of-bounds subscripts. 

APPENDIX E 








CHARACTER SETS 

CHAR    ASCII   HEX   ASCII            CHAR    ASCII   HEX   ASCII 
                      CARD CODE                              CARD CODE 

NUL     000     00    12-0-9-8-1       @       100     40    8-4 
SOH     001     01    12-9-1           A       101     41    12-1 
STX     002     02    12-9-2           B       102     42    12-2 
ETX     003     03    12-9-3           C       103     43    12-3 
EOT     004     04    9-7              D       104     44    12-4 
ENQ     005     05    0-9-8-5          E       105     45    12-5 
ACK     006     06    0-9-8-6          F       106     46    12-6 
BEL     007     07    0-9-8-7          G       107     47    12-7 
BS      010     08    11-9-6           H       110     48    12-8 
HT      011     09    12-9-5           I       111     49    12-9 
LF      012     0A    0-9-5            J       112     4A    11-1 
VT      013     0B    12-9-8-3         K       113     4B    11-2 
FF      014     0C    12-9-8-4         L       114     4C    11-3 
CR      015     0D    12-9-8-5         M       115     4D    11-4 
SO      016     0E    12-9-8-6         N       116     4E    11-5 
SI      017     0F    12-9-8-7         O       117     4F    11-6 
DLE     020     10    12-11-9-8-1      P       120     50    11-7 
DC1     021     11    11-9-1           Q       121     51    11-8 
DC2     022     12    11-9-2           R       122     52    11-9 
DC3     023     13    11-9-3           S       123     53    0-2 
DC4     024     14    9-8-4            T       124     54    0-3 
NAK     025     15    9-8-5            U       125     55    0-4 
SYN     026     16    9-2              V       126     56    0-5 
ETB     027     17    0-9-6            W       127     57    0-6 
CAN     030     18    11-9-8           X       130     58    0-7 
EM      031     19    11-9-8-1         Y       131     59    0-8 
SUB     032     1A    9-8-7            Z       132     5A    0-9 
ESC     033     1B    0-9-7            [       133     5B    12-8-2 
FS      034     1C    11-9-8-4         \       134     5C    0-8-2 
GS      035     1D    11-9-8-5         ]       135     5D    11-8-2 
RS      036     1E    11-9-8-6         ^       136     5E    11-8-7 
US      037     1F    11-9-8-7         _       137     5F    0-8-5 
Space   040     20    None             `       140     60    8-1 
!       041     21    12-8-7           a       141     61    12-0-1 
"       042     22    8-7              b       142     62    12-0-2 
#       043     23    8-3              c       143     63    12-0-3 
$       044     24    11-8-3           d       144     64    12-0-4 
%       045     25    0-8-4            e       145     65    12-0-5 
&       046     26    12               f       146     66    12-0-6 
'       047     27    8-5              g       147     67    12-0-7 
(       050     28    12-8-5           h       150     68    12-0-8 
)       051     29    11-8-5           i       151     69    12-0-9 
*       052     2A    11-8-4           j       152     6A    12-11-1 
+       053     2B    12-8-6           k       153     6B    12-11-2 
,       054     2C    0-8-3            l       154     6C    12-11-3 
-       055     2D    11               m       155     6D    12-11-4 
.       056     2E    12-8-3           n       156     6E    12-11-5 
/       057     2F    0-1              o       157     6F    12-11-6 
0       060     30    0                p       160     70    12-11-7 
1       061     31    1                q       161     71    12-11-8 
2       062     32    2                r       162     72    12-11-9 
3       063     33    3                s       163     73    11-0-2 
4       064     34    4                t       164     74    11-0-3 
5       065     35    5                u       165     75    11-0-4 
6       066     36    6                v       166     76    11-0-5 
7       067     37    7                w       167     77    11-0-6 
8       070     38    8                x       170     78    11-0-7 
9       071     39    9                y       171     79    11-0-8 
:       072     3A    8-2              z       172     7A    11-0-9 
;       073     3B    11-8-6           {       173     7B    12-0 
<       074     3C    12-8-4           |       174     7C    12-11 
=       075     3D    8-6              }       175     7D    11-0 
>       076     3E    0-8-6            ~       176     7E    11-0-1 
?       077     3F    0-8-7            DEL     177     7F    12-9-7 